Extracting macroscopic information from Web links

Author

  • Mike Thelwall
Abstract

Much has been written about the potential and pitfalls of macroscopic web-based link analysis, yet there have been no studies that have provided clear statistical evidence that any of the proposed calculations can produce results over large areas of the web that correlate with phenomena external to the Internet. This article attempts to provide such evidence through an evaluation of Ingwersen’s (1998) proposed external Web Impact Factor (WIF) for the original use of the web: the interlinking of academic research. In particular, it studies the case of the relationship between academic hyperlinks and research activity for universities in Britain, a country chosen for its variety of institutions and the existence of an official government rating exercise for research. After reviewing the numerous reasons why link counts may be unreliable, it demonstrates that four different WIFs do, in fact, correlate with the conventional academic research measures. The WIF delivering the greatest correlation with research rankings was the ratio of web pages with links pointing at research-based pages to faculty numbers. The scarcity of links to electronic academic papers in the data set suggests that, in contrast to citation analysis, this WIF is measuring the reputations of universities and their scholars, rather than the quality of their publications.
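
The best-performing WIF described in the abstract is a simple ratio, and the claim that it "correlates with research rankings" is a rank-correlation statement. The sketch below illustrates both steps under stated assumptions: the function names, the choice of Spearman's rank correlation, and every number in the sample data are hypothetical and are not taken from the study.

```python
from statistics import mean


def web_impact_factor(inlinking_pages, faculty):
    """WIF variant from the abstract: pages linking in, divided by faculty numbers."""
    if faculty <= 0:
        raise ValueError("faculty must be positive")
    return inlinking_pages / faculty


def ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented figures for illustration only, NOT data from the study:
# (pages linking to research-based pages, faculty numbers, research rating)
universities = {
    "A": (1200, 400, 5.1),
    "B": (300, 250, 2.0),
    "C": (900, 500, 3.4),
    "D": (150, 300, 1.2),
}

wifs = [web_impact_factor(p, f) for p, f, _ in universities.values()]
ratings = [r for _, _, r in universities.values()]
rho = spearman(wifs, ratings)
print(f"Spearman rho between WIF and research rating: {rho:.3f}")
```

Dividing by faculty numbers normalises for institution size, so a large university does not score highly merely because it has more staff to attract links. A rank-based coefficient is used here because link counts tend to be heavily skewed; that choice is an assumption of this sketch, not something the abstract specifies.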

Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Towards Intelligent Information Retrieval on Web

The World Wide Web is an information resource with virtually unlimited potential. However, this potential is relatively untapped because it is difficult for machines to process and integrate this information meaningfully. Today the WWW links more than 15 billion pages, and retrieving relevant information on the web is a main concern. As the internet grew and became popular, m...

Semi-Structured File Analysis for Information Integration

This paper describes a PostScript file analyzer for extracting information from Web PostScript documents. Our motivation for studying this problem is the building of an information-integration system. The information extracted from these semi-structured files can be used to model the contents of Web information sources and to define semantic links between items of information. Extracted informat...

Extracting knowledge from the World Wide Web.

The World Wide Web provides an unprecedented opportunity to automatically analyze a large sample of interests and activity in the world. We discuss methods for extracting knowledge from the web by randomly sampling and analyzing hosts and pages, and by analyzing the link structure of the web and how links accumulate over time. A variety of interesting and valuable information can be extracted, s...

An Automated Algorithm for Extracting Website Skeleton

The huge amount of information available on the Web has attracted many research efforts into developing wrappers that extract data from webpages. However, as most systems for generating wrappers focus on extracting data at page level, data extraction at site level remains a manual or semiautomatic process. In this paper, we study the problem of extracting a website skeleton, i.e. extractin...

Journal title:
  • JASIST

Volume 52  Issue 

Pages  -

Publication date 2001